Credit Card Users Churn Prediction

Project 5

by Anthony Amabile

Thera Bank has seen a sharp decline in the number of credit card users. Credit card customers are very valuable to the bank, so every customer who leaves represents a loss.

This project aims to:

  1. Provide an analysis of why this is occurring.
  2. Develop a model to predict if a given customer will be leaving the credit card business line.

There are no duplicate values. There are some null and NaN values that will need to be examined and treated.
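The duplicate and null checks described above can be sketched roughly as follows. This is a minimal illustration on an invented toy frame, not the notebook's original code; only the column names come from this report.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the customer data; the values are invented for illustration.
df = pd.DataFrame(
    {
        "Gender": ["M", "F", "F", "M"],
        "Education_Level": ["Graduate", np.nan, "High School", "Graduate"],
        "Marital_Status": ["Married", "Single", np.nan, "Single"],
    }
)

# Number of fully duplicated rows (0 means there are no duplicates).
n_duplicates = int(df.duplicated().sum())

# Count of null/NaN values in each column.
nulls_per_column = df.isnull().sum()

print(n_duplicates)
print(nulls_per_column)
```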

In addition, the string attribute columns of Attrition_Flag, Gender, and Marital_Status will need to be converted to dummy columns. We will revisit this when building our model.

The columns of Education_Level, Income_Category, and Card_Category will need to be converted to categorical numerical values before our model is created as well.
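One possible way to convert an ordered string column to numerical codes is an explicit mapping. The sketch below uses Card_Category; the category labels and their ordering are assumptions made for illustration and should be checked against the actual data.

```python
import pandas as pd

# Toy data; the labels below are assumed, not taken from the real data set.
df = pd.DataFrame({"Card_Category": ["Blue", "Gold", "Silver", "Blue"]})

# Hypothetical ordering from lowest to highest card tier.
card_order = {"Blue": 0, "Silver": 1, "Gold": 2, "Platinum": 3}

# Replace each label with its numerical rank.
df["Card_Category"] = df["Card_Category"].map(card_order)

# Any label missing from the mapping would become NaN, so check for that.
unmapped = int(df["Card_Category"].isnull().sum())
print(df["Card_Category"].tolist(), unmapped)
```

The same pattern would apply to Education_Level and Income_Category, each with its own mapping.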

No row has more than 2 missing values in it.

However, roughly 21% of the rows have at least one null value.
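Row-level figures like these can be computed by counting nulls across each row. A minimal sketch on invented toy data (the 21% figure above comes from the real data set, not from this example):

```python
import numpy as np
import pandas as pd

# Toy data with a few missing values; the values are invented.
df = pd.DataFrame(
    {
        "Education_Level": ["Graduate", np.nan, "High School", np.nan, "College"],
        "Marital_Status": ["Married", "Single", np.nan, np.nan, "Single"],
        "Gender": ["M", "F", "F", "M", "F"],
    }
)

# Number of missing values in each row.
missing_per_row = df.isnull().sum(axis=1)

# Largest number of missing values found in any single row.
max_missing_in_a_row = int(missing_per_row.max())

# Share of rows that have at least one missing value.
share_rows_with_missing = float((missing_per_row > 0).mean())

print(max_missing_in_a_row, share_rows_with_missing)
```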

For Education_Level (object) and Marital_Status (object), the mode will be imputed in place of null values.
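Mode imputation for these two columns might look like the sketch below. The toy values are invented; only the column names come from this report.

```python
import numpy as np
import pandas as pd

# Toy data; the values are invented for illustration.
df = pd.DataFrame(
    {
        "Education_Level": ["Graduate", np.nan, "High School", "Graduate"],
        "Marital_Status": ["Married", "Single", np.nan, "Married"],
    }
)

for col in ["Education_Level", "Marital_Status"]:
    # mode() ignores NaN by default; take the first value if there is a tie.
    most_frequent = df[col].mode()[0]
    df[col] = df[col].fillna(most_frequent)

print(df)
```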

All of the NaN values have been addressed.

Observations

  1. There are 10,127 customers in our data set.
  2. The median age for customers in the data set is 46.
  3. The typical customer has 2 dependents.
  4. The typical customer has a relationship with the bank of roughly 3 years.
  5. The typical customer has 4 products from the bank.
  6. The typical customer only has roughly 2 months of inactivity.
  7. The typical customer has had roughly 2-3 contacts with the bank in the last 12 months.
  8. The typical customer has a credit limit of roughly $4,550.
  9. The typical customer has a revolving balance of roughly $1,276.
  10. The typical customer has about $3,474 available to use on their credit card.
  11. The typical customer has a Q4 to Q1 ratio of transaction amount of roughly 0.74; spending is higher in Q1 compared to Q4.
  12. The typical customer has a 12-month transaction amount of $3,899.
  13. The typical customer has a total of 67 transactions.
  14. The typical customer has a Q4 to Q1 ratio of number of transactions of roughly 0.70; more transactions are made in Q1 compared to Q4.
  15. The typical customer has an average utilization ratio of 0.18.

EDA

Univariate analysis

Observations on Customer Age

Observations on Dependent Count

Observations on Months on Book

Observations on Total Relationship Count

Observations on Months Inactive in the Last 12 Months

Observations on Contacts Count in the Last 12 Months

Observations on Credit Limit

Observations on Total Revolving Balance

Observations on Average Open to Buy

Observations on Total Amount Change Q4 to Q1

Observations on Average Utilization Ratio

Variables with a right skew include: credit limit, average open to buy, total amount change Q4 to Q1, and average utilization ratio. The remaining variables have relatively normal distributions.
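Besides reading the plots, skew can be checked numerically. The sketch below is an illustration on invented data: the column names Credit_Limit and Customer_Age are assumed, and the cutoff of 1 is a common rule of thumb rather than the threshold used in this project.

```python
import pandas as pd

# Toy data; column names and values are assumptions for illustration.
df = pd.DataFrame(
    {
        "Credit_Limit": [1500, 2000, 2500, 3000, 34000],  # long right tail
        "Customer_Age": [40, 44, 46, 48, 52],             # symmetric
    }
)

# Sample skewness per column; positive values indicate a right skew.
skewness = df.skew()

# Flag columns whose skewness exceeds the assumed cutoff of 1.
right_skewed = skewness[skewness > 1].index.tolist()

print(skewness)
print(right_skewed)
```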

Observations on Attrition Flag

Observations on Gender

Observations on Education Level

Observations on Marital Status

Observations on Income Category

Observations on Card Category

Bivariate Analysis

The columns of Education_Level, Income_Category, and Card_Category need to be converted to categorical numerical values.

Observations for Model building:

Observations for Customer Analysis:

Attrition Flag vs Months on Book

Attrition Flag vs Dependent Count

Attrition Flag vs Total Relationship Count

Attrition Flag vs Months Inactive (in the past 12 months)

Attrition Flag vs Contacts Count (in the past 12 months)

Attrition Flag vs Credit Limit

Attrition Flag vs Total Revolving Balance

Attrition Flag vs Average Open to Buy

Attrition Flag vs Total Amount Change Q4 to Q1

Attrition Flag vs Total Transaction Amount

Attrition Flag vs Total Transaction Count

Attrition Flag vs Total Count Change Q4 to Q1

Attrition Flag vs Average Utilization Ratio

Data Cleaning to do before Model Creation

The string attribute columns of Gender and Marital_Status will need to be converted to dummy columns.

Variables/columns to drop:

Variables/columns to normalize:

Dropping columns for our model.

Z Transformation to Normalize Skewed Variables
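A z transformation subtracts each column's mean and divides by its standard deviation. A minimal sketch, assuming a hypothetical Credit_Limit column with invented values:

```python
import pandas as pd

# Toy data; the column name and values are assumptions for illustration.
df = pd.DataFrame({"Credit_Limit": [1500.0, 2000.0, 2500.0, 3000.0, 34000.0]})

cols_to_scale = ["Credit_Limit"]

# z = (x - mean) / standard deviation, computed column by column.
means = df[cols_to_scale].mean()
stds = df[cols_to_scale].std()
df[cols_to_scale] = (df[cols_to_scale] - means) / stds

print(df)
```

Note that a z transformation puts columns on a common scale but does not change the shape of a distribution, so a skewed column stays skewed after scaling. In a modeling workflow the mean and standard deviation would normally be computed on the training data only and reused for the test data.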

Creating Dummy variables
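Dummy columns for Gender and Marital_Status could be created with pandas as sketched below. The toy values and the Customer_Age column are invented, and dropping the first level of each variable is an assumption rather than a confirmed choice from this project.

```python
import pandas as pd

# Toy data; the values are invented for illustration.
df = pd.DataFrame(
    {
        "Gender": ["M", "F", "F"],
        "Marital_Status": ["Married", "Single", "Divorced"],
        "Customer_Age": [46, 39, 51],
    }
)

# One indicator column per category; drop_first avoids a redundant column.
df = pd.get_dummies(df, columns=["Gender", "Marital_Status"], drop_first=True)

print(df.columns.tolist())
```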

Model Building

Bagging Classifier Model

Random Forest Model

Decision Tree Model

Adaboost Classifier Model

Gradient Boosting Classifier Model

XGBoost Classifier Model

Building the Models using Oversampled Data
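As an illustration of the idea, the sketch below balances the classes by random oversampling: minority-class rows are resampled with replacement until both classes are the same size. This is a stand-in technique on invented data; the oversampling method actually used in this project is not stated here and may differ. The feature column and the 0/1 encoding of Attrition_Flag are assumptions.

```python
import pandas as pd

# Toy training data; 1 is assumed to mark customers who left.
train = pd.DataFrame(
    {
        "feature": [10, 12, 9, 14, 11, 13, 3, 4],
        "Attrition_Flag": [0, 0, 0, 0, 0, 0, 1, 1],
    }
)

counts = train["Attrition_Flag"].value_counts()
minority_label = counts.idxmin()
minority = train[train["Attrition_Flag"] == minority_label]

# Draw extra minority rows with replacement to match the majority count.
n_extra = int(counts.max() - counts.min())
extra = minority.sample(n=n_extra, replace=True, random_state=1)

train_over = pd.concat([train, extra], ignore_index=True)
print(train_over["Attrition_Flag"].value_counts())
```

Resampling should be applied to the training split only, so that the test data keeps the original class balance.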

Bagging Classifier Model With Oversampled Data

Random Forest Model With Oversampled Data

Decision Tree Model With Oversampled Data

Adaboost Classifier Model Using Oversampled Data

Gradient Boosting Classifier Model Using Oversampled Data

XGBoost Classifier Model Using Oversampled Data

Building the Models using Undersampled Data
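Undersampling works in the opposite direction: majority-class rows are discarded until the classes are the same size. The sketch below uses simple random undersampling on invented data; the method actually used in this project is not stated here and may differ. The feature column and the 0/1 encoding of Attrition_Flag are assumptions.

```python
import pandas as pd

# Toy training data; 1 is assumed to mark customers who left.
train = pd.DataFrame(
    {
        "feature": [10, 12, 9, 14, 11, 13, 3, 4],
        "Attrition_Flag": [0, 0, 0, 0, 0, 0, 1, 1],
    }
)

counts = train["Attrition_Flag"].value_counts()
majority = train[train["Attrition_Flag"] == counts.idxmax()]
minority = train[train["Attrition_Flag"] == counts.idxmin()]

# Keep only as many majority rows as there are minority rows.
majority_down = majority.sample(n=len(minority), random_state=1)

train_under = pd.concat([majority_down, minority], ignore_index=True)
print(train_under["Attrition_Flag"].value_counts())
```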

Bagging Classifier Model With Undersampled Data

Random Forest Model With Undersampled Data

Decision Tree Model With Undersampled Data

Adaboost Classifier Model Using Undersampled Data

Gradient Boosting Classifier Model Using Undersampled Data

XGBoost Classifier Model Using Undersampled Data

Summary of Model Performances

Three models that would probably benefit the most from tuning include:

These models all perform decently, scoring well on some metrics and poorly on others. They are all slightly overfit, so we will tune them.
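One simple way to gauge overfitting is to compare a metric on the training data with the same metric on the test data. The sketch below does this for an untuned decision tree on synthetic data; the data, the class weights, and the choice of recall as the metric are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the customer data.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# An unpruned tree can memorize the training data.
model = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)

train_recall = recall_score(y_train, model.predict(X_train))
test_recall = recall_score(y_test, model.predict(X_test))

# A large positive gap suggests the model is overfit.
gap = train_recall - test_recall
print(train_recall, test_recall, gap)
```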

The three models with the best test performance are:

We will tune these models as well.

Tuning Models for Improved Performance
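Hyperparameter tuning can be sketched as a cross-validated grid search. The example below tunes a random forest on synthetic data; the parameter grid, the recall scorer, and the data are assumptions for illustration and are not the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced stand-in for the training data.
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.8, 0.2], random_state=1
)

# Hypothetical grid; real grids are usually wider.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, 5]}

# Try every combination with 3-fold cross-validation, scoring by recall.
search = GridSearchCV(
    RandomForestClassifier(random_state=1),
    param_grid,
    scoring="recall",
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_
print(best_params, search.best_score_)
```

The fitted search object exposes the refit best model as search.best_estimator_, which can then be evaluated on held-out test data.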

Tuning the Bagging Classifier Model

Tuning the Random Forest Estimator

Tuning the Decision Tree with Downsampled Data

Observations:

Tuning the Best Performing Models

Tuning the Adaboost Classifier

Tuning the XGBoost Classifier

Tuning the XGBoost Classifier with Up Sampled Data

Observations:

The model with the best overall and generalized performance is the tuned XGBoost Classifier. The model doesn't overfit the data and can accurately classify which customers are most likely to leave within ~6%.

Productionalizing the Best Model

Conclusion - Insights and Business Recommendations

Insights

The typical customer is 46, married, has 2 dependents, has a graduate degree or higher, has an income of over $40k, has been with the bank for 3 years and has 4 products from the bank. The typical customer also has a credit limit of about $4,550, a revolving balance of $1,276, an available balance of $3,474, uses their card more in Q1 compared to Q4, spends an average of $3,899 on their credit card per year, has about 67 credit card transactions and has an average utilization ratio of 0.17.

There is a large opportunity to incentivize customers to use their credit cards more. The more customers as a whole use their credit cards, the more customers will be retained.

Recommendations

Thera Bank should target the following existing credit card customers for retention, as they are at the highest risk of leaving:

Good retention solutions include: